Categorical data squashing by combining factor levels

نویسندگان

  • Petros Dellaportas
  • Claudia Tarantola
چکیده

We deal with contingency table data that are used to examine the relationships between a set of categorical variables or factors. We assume that such relationships can be adequately described by the conditional independent structure imposed by an undirected graphical model. If the contingency table is large, a desirable simplified interpretation can be achieved by combining some categories, or levels, of the factors. We introduce conditions under which such an operation does not alter the Markov properties of the graph. Implementation of these conditions leads to likelihood ratio tests and Bayesian model uncertainty procedures based on reversible jump MCMC. The methodology is illustrated with reference to a 2× 3× 4 contingency table.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ارائه یک الگوریتم خوشه بندی برای داده های دسته ای با ترکیب معیارها

Clustering is one of the main techniques in data mining. Clustering is a process that classifies data set into groups. In clustering, the data in a cluster are the closest to each other and the data in two different clusters have the most difference. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...

متن کامل

Career-Path Analysis Using Optimal Matching and Self-Organizing Maps

This paper is devoted to the analysis of career paths and employability. The state-of-the-art on this topic is rather poor in methodologies. Some authors propose distances well adapted to the data, but are limiting their analysis to hierarchical clustering. Other authors apply sophisticated methods, but only after paying the price of transforming the categorical data into continuous, via a fact...

متن کامل

Unsupervised Learning of Mixtures of Multiple Causes in Binary Data

This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data. Unlike the standard mixture model, a multiple cause model accounts for observed data by combining assertions from many hidden causes, each of which can pertain to varying degree to any subset of the observable dimensions. A crucial issue is the mixing-function for combini...

متن کامل

مدل رگرسیون لجستیک چند حالته با مقادیر گم شده و کاربرد آن در بررسی بیماری گواتر

In large–scale sampling opeartions (e.g. nation-wide health surveys) we always face the problem of non-response item(s) and/or non-response unit(s). In fitting a model to the data we have two groups of variables, namely dependent and independent variables. Non-response may occur for any of these groups of variables. In this paper we assume Y as a categorical dependent variable with three levels...

متن کامل

On a general class of non-squashing partitions

We define M-sequence non-squashing partitions, which specialize to m-ary partitions Sloane, among others), factorial partitions, and numerous other general partition families of interest. We establish an exact formula, various combinatorial interpretations, as well as the asymptotic growth of M-sequence non-squashing partition functions, functions whose associated generating functions are non-m...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003